Abstract

This paper analyzes the relation between public, education-related infrastructure and the quality of education in schools. The analysis uses a case study of the establishment of two large, high-quality public libraries in low-income areas of Bogota, Colombia. It assesses the impact of these libraries on the quality of education by comparing national test scores (SABER 11) of schools close to and far from the libraries before (2000-02) and after (2003-08) the libraries were opened. The paper introduces a Blinder-Oaxaca decomposition of difference-in-differences estimates to assess whether variation in traditional determinants of mathematics, verbal, and science test scores explains the estimates. The analysis finds effects attributable to the establishment of the libraries that are not statistically different from zero. These results are robust to alternative specifications, a synthetic control approach, and an alternative measure of distance.
1 Introduction


Facilitating public access to information, the traditional primary function of libraries, is being challenged by the information revolution. However, public libraries serve multiple functions beyond disseminating materials. A large wave of public library construction in the developing world reflects these functions, emphasizing libraries as centers of social transformation in deprived slums that provide the general population, especially the less well-off, with access to meeting spaces, cultural activities, technology, and information services, among others. For example, impressive (and expensive) massive public libraries were built in the most impoverished areas of Medellin (Colombia), in zones with high crime rates, and of Bogota (Colombia). These libraries are not only places where one can find books or magazines for free, but also places offering a wide range of services intended to draw the general public towards culture and education and, ultimately, to change people's living conditions.

The goal of this paper is to establish the impact on the quality of education of the 2001 construction of two of these massive libraries (henceforth, mega-libraries) in the city of Bogota (Colombia). Even public schools provide services to a selected group of students, so in a sense they can be considered private assets. Public libraries, however, are available to students from different schools. Thus, this study speaks to the possible effect of truly public, education-related infrastructure on the quality of education. It is also possible to assess latent complementarities between public (libraries) and private (schools) educational services in enhancing the quality of education by estimating the effect of libraries on the returns to certain school characteristics. In other words, the paper studies how public libraries affect the quality of education and to what extent this effect could operate through the enhancement of services provided by schools.

This paper contributes a new perspective to the literature on the determinants of the quality of education. This literature generally limits itself to private characteristics of the school and the family to explain differences in student performance. By widening the set of determinants beyond the walls of the school and the home, this paper looks at public goods located around schools that could be used to enhance the impact of schools' inputs. At the same time, since the main objective of libraries is not to influence the quality of education in schools directly, this paper contributes to the urban economics literature by analyzing the existence of externalities and complementarities between this kind of public infrastructure and the schools or households near the libraries.

The causal effect of access to public libraries on student academic performance is assessed using a Difference-in-Differences (DiD) methodology, combined with propensity score matching as a robustness test. The procedure takes advantage of the spatial location of the libraries: the first, El Tunal, was built on the grounds of a public park, and the second, El Tintal, on the site of an old garbage processing plant. We compare the average scores of schools close to the libraries and of those far from them on the standardized test taken at the end of secondary school (SABER 11) from 2000 to 2008, that is, before and after the libraries opened. This comparison is implemented under both parametric and nonparametric specifications of the relationship between distance to the library and test scores. We also implement an Oaxaca-Blinder decomposition of the impact of the program on the quality of education to explore whether the improvement could operate through variation in traditional inputs of education quality.

Given our specification, we consider both the direct and indirect impacts that the libraries could have on student performance. Direct impacts might arise because students living close to the libraries use library services and programs on their own, or because nearby schools deliberately use the library for their own activities. Indirect effects might come from the renovation of the public infrastructure in the area, which could improve perceptions of crime, the general mood of the population, or other neighborhood effects. Because we lack information on students' actual residences and on specific school programs that use the libraries, we cannot assess these channels separately.

Our main results show that, although the estimated effect has the expected positive sign, it is not statistically significant. This suggests either that schools do not fully exploit the libraries or that the possible gains are concentrated among particular types of individuals. It also raises the question of how well incentives are aligned to foster cooperation between schools and public libraries in order to improve the quality of education. Perhaps it is not enough to build beautiful, well-equipped libraries near schools; a second generation of policies might be required to improve coordination between these libraries and the current educational environment of neighborhood schools and households.

The remainder of this paper is organized as follows: Section 2 discusses the theoretical links between libraries and the quality of education; Section 3 describes the program and its context; Section 4 presents the data on quality of education and other controls; Section 5 discusses the identification strategy and the decomposition of the effect; Section 6 presents the results; and Section 7 concludes.

2 Libraries and academic performance 

Vegas and Petrow (2008) classify determinants of education into demand-side and supply-side components. Both groups include tangible and intangible inputs defined by students' access to private facilities or by their environments. For instance, on the demand side, important inputs include an environment, shaped by parental characteristics, that promotes study (Fertig and Schmidt, 2002; World Bank, 2005) and the availability of educational resources in the household, such as books or well-used internet access (Murnane et al., 1981; Gamboa et al., 2010; Blomeyer et al., 2009). On the supply side, libraries are included as physical infrastructure along with other, intangible inputs that are generally considered more important, such as educational policies that incentivize competition among schools, and teacher quality (Hanushek and Wößmann, 2007).

Focusing on the impact of libraries on education beyond the 'infrastructure' component of schools, Lance (1994), in a largely descriptive study of improvements in school performance associated with libraries in Colorado, shows a relationship between the availability of libraries and specific skills such as reading, writing and critical thinking. Similar relationships are discussed in further research by Lance and others on libraries in the United States (Lance, 1994; Lance et al., 2000; Rodney et al., 2002) and the United Kingdom (Williams et al., 2001). Lonsdale (2003) reviews studies linking libraries to educational outcomes, such as Smith (2001), which argues that libraries improve school performance by 4%. However, this literature does not involve causal analysis; it rests on correlations and qualitative analysis.

In terms of proper causal analyses, few studies analyze libraries themselves. The most relevant literature analyzes the impact on educational outcomes of programs that make learning materials more available in schools. These learning materials, a traditional part of library services, include textbooks (Glewwe et al., 2009), flipcharts (Glewwe et al., 2004) and computers in schools (Barrera-Osorio and Linden, 2009). Across programs, each with its own particularities, none of the authors finds an impact of the respective learning material on the quality of education received by the average student. 1 However, these evaluations do not consider the joint effect derived from the interaction of these learning materials, an effect that could be captured in an analysis of public libraries, since these institutions provide learning materials simultaneously.

Borkum et al. (2013) is the only study we found that explores the role of libraries in educational outcomes. In an evaluation of an educational program in Bangalore, India, that provides high-quality libraries to public primary schools, the authors find no impact of school libraries on scores in different subjects or on dropout rates. Given that this study does not consider public libraries and, most importantly, the type of public libraries that we consider (mega-libraries), the present study is the first to present causal evidence on the effect of public mega-libraries 2 on educational outcomes within impoverished areas in a developing country.

We propose that the production function of education quality for school $i$, $Y_i$, in urban areas includes not only the demand characteristics it faces, $X_{1,i}$, and private supply (in this case, school) characteristics, $X_{2,i}$, but also the benefit from public, education-related facilities, $Z_i$ (Equation 1). This additional input acts as a complement to the education provided by schools. Assuming that these institutions do have a positive impact on the test-related skills of their users, the relationship between $Z$ and $Y$ might vary according to the interaction between the demand and supply elements related to using the public, education-related facilities. In other words, the impact of public, education-related facilities on the quality of education depends on the degree to which families use them directly and schools facilitate their use. 3 Consider two examples: first, for school managers who obtain more benefits than others from promoting activities related to a particular public facility, $Z$ might be larger; second, families living far from public facilities are less likely to benefit from them due to credit or time constraints, which will be reflected in a lower value of $Z$ than for those who live close by.

1 In an evaluation of the impact of textbooks on student achievement, Glewwe et al. (2009) find a localized positive effect on students who already had relatively high achievement.

2 Mega-libraries are not just large buildings full of learning materials; they are a catalyst for the redevelopment of urban zones and repositories of new public spaces.

3 Positive returns to higher levels of school quality based on facility use in Colombia are expected for families (Gamboa and Rodríguez-Lesmes, 2014). However, it is not clear that all schools have the same incentives (Gaviria and Barrientos, 2001).

$$Y_i = f\big(X_{1,i},\, X_{2,i},\, Z_i(X_{1,i}, X_{2,i})\big) \qquad (1)$$

Our data cover only one kind of public, education-related facility (the mega-libraries) from which to calculate $Z$, and we do not have information on the relationship between schools or households and libraries, so we cannot disentangle the relationship between $Z$ and $Y$ at the level of detail just described. Given these data restrictions, we use the proximity of schools to the libraries as a proxy for $Z$.

To link schools and libraries, we use the distance between them as a measure of intensity. That is, we identify the difference $\delta$ of being close rather than far from the public facility by assigning a discrete value $T = 1$ if a school is within a close range of a library and $T = 0$ if it is outside this range. Our main assumption is that if a school is far enough from the public facility, its students receive no benefit from it ($Z = 0$, as shown in Equation 2).

$$
\begin{aligned}
\delta &= E[Y_i \mid T = 1, X] - E[Y_i \mid T = 0, X] \\
&= f\big(X_{1,i}(T=1), X_{2,i}(T=1), Z(X_{1,i}(T=1), X_{2,i}(T=1))\big) - f\big(X_{1,i}(T=0), X_{2,i}(T=0), Z(X_{1,i}(T=0), X_{2,i}(T=0))\big) \\
&= f\big(X_{1,i}(T=1), X_{2,i}(T=1), Z(X_{1,i}(T=1), X_{2,i}(T=1))\big) - f\big(X_{1,i}(T=0), X_{2,i}(T=0), 0\big) \qquad (2)
\end{aligned}
$$

3 BibloRed program and Colombian schools 

BibloRed is a program that Bogota's local administration designed in 1998 and put into operation by the end of 2001. The idea was to give the general population access to information services and to reading and writing resources. The program also seeks to foster cultural growth and promote research. In the first stage, it started with 3 major libraries (El Tunal, El Tintal and Virgilio Barco), 15 minor libraries and 1 bibliobus; almost ten years later, another major library (Julio Mario Santo Domingo) started operations. Each major library has an area of around 10,000 square meters, 150,000 volumes and 600 reader seats (Tolosa, 2012). Services include not only books and magazines, but also children's rooms with specialized staff, programs for babies and their parents, activities for teens, and workshops in literature, puppetry, etc. The intention is to attract the public with these activities while integrating education into them. One of the main projects takes place during holidays, when BibloRed runs Bibliovacaciones, a program with the activities mentioned plus free art, history and literature exhibitions and events such as theater plays and films. In this context, these libraries clearly offer many activities that enhance quality of life, particularly through their integration of culture; thus, the possible effect on the educational performance of children and young people is just one of the multiple benefits that libraries bring to society.

Since we have no information on which test-takers actually use the libraries, we use the distance from each school to the libraries as a proxy for treatment status. As discussed in the previous section, this rests on the assumption that library use is likely to be higher among those living closer than among those living far away, because the travel costs incurred by the latter reduce their incentives to visit frequently. According to Table 1, 77% of students in Bogota live less than 20 minutes from the school they attend. It is therefore a fair assumption that the distance from a school to the library approximates the distance from the library to students' residences and, hence, the likelihood that students live in an environment affected by the libraries.

The Euclidean distance between each school and the local library is shown in Figure 1. We calculate it from the spatial location of each school as specified by Bogota's Department of Education. Alternatively, we use road-based distances, as shown in Figure 2. 4 Figure B.1 presents the link between both distances. As expected, all road-based distances fall above the blue 45-degree line. The black dotted line is the predicted linear relationship between both measures, which captures up to 80% of the total variation. As a robustness check, the main estimators are repeated using the fitted distance. 5

4 These calculations were made using the Closest Facility analysis in ESRI ArcMap 10.2. The road network was obtained from the OpenStreetMap (OSM) project.

5 More explicitly, $\text{AdjustedRD} = (RD - \hat\beta_0)/\hat\beta_1$, where the $\hat\beta$ come from the OLS regression of road distance $RD$ on Euclidean distance $ED$: $RD = \beta_0 + \beta_1 ED + u$.
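The two distance measures can be sketched in a few lines. This is a minimal illustration, not the ArcMap workflow used in the paper: it assumes school and library coordinates are already projected to meters (so straight-line distance is meaningful), takes road distances as given, and assumes the adjustment maps road distance back onto the Euclidean scale as $(RD - \hat\beta_0)/\hat\beta_1$. Function names are hypothetical.

```python
import numpy as np


def nearest_euclidean(schools_xy, libraries_xy):
    """Straight-line distance from each school to its closest library.

    Coordinates are assumed to be projected (e.g. in meters), not lat/lon.
    """
    s = np.asarray(schools_xy, dtype=float)[:, None, :]      # (n_schools, 1, 2)
    lib = np.asarray(libraries_xy, dtype=float)[None, :, :]  # (1, n_libraries, 2)
    return np.sqrt(((s - lib) ** 2).sum(axis=2)).min(axis=1)


def adjusted_road_distance(euclid, road):
    """Express road distance on the Euclidean scale via OLS of RD on ED.

    Fits RD = b0 + b1 * ED + u and returns (RD - b0) / b1 (assumed form of
    the adjustment), the coefficients, and the R^2 of the linear fit.
    """
    ed = np.asarray(euclid, dtype=float)
    rd = np.asarray(road, dtype=float)
    X = np.column_stack([np.ones_like(ed), ed])
    (b0, b1), *_ = np.linalg.lstsq(X, rd, rcond=None)
    fitted = b0 + b1 * ed
    r2 = 1.0 - ((rd - fitted) ** 2).sum() / ((rd - rd.mean()) ** 2).sum()
    return (rd - b0) / b1, (b0, b1), r2
```

Putting road distance on the Euclidean scale lets the same cutoffs (750 m, 1,000 m, and so on) be reused in the robustness check; the $R^2$ plays the role of the "share of variation captured" by the fitted line.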
Figure 1: Libraries and treatment status allocation: euclidean distance



Figure 2: Libraries and treatment status allocation: road distance 



El Tintal and El Tunal libraries are located in middle-low income zones, where most students attend nearby schools. Schools near Virgilio Barco and Julio Mario Santo Domingo serve, on average, wealthier families, who are more likely to live far from school and to commute daily by private transport. If we included the last two libraries, using the distance between the library and the school to represent treatment status would not be accurate. We therefore include only El Tintal and El Tunal libraries in this analysis.

In Colombia, schools can be classified according to four important characteristics that the literature closely relates to the quality of education: whether the school is managed by the government, the proportion of female to male students attending the school, the start of the academic year, and the length of the school day. Regarding the first characteristic, most of the students who would demand library services belong to the government-managed education system. Public schools are free at the primary level and charge low tuition fees at the secondary level, but provide a lower quality of education than private schools (Nunez et al., 2002). 6 Regarding the second characteristic, the fact that some parents may prefer specific types of education, such as religious institutions or single-sex schools, could be correlated with demand-side factors. With respect to the start of the academic year, schools follow either calendar A or calendar B, starting in January or August, respectively. While calendar A is the norm, calendar B schools are typically private institutions designed to follow European or US schedules. As a result, calendar B schools typically have higher test scores, owing to strong selection related to the high income of students' families. Finally, schools can serve students for a full school day (12 hours) or implement double shifts, with some students attending in the morning and others in the afternoon. 7 Double-shifting is usually associated with lower academic results in the Latin American context, as documented by Bonilla-Mejía (2011).

6 A small number of public schools are managed by the private sector and seem to follow a different pattern (Sarmiento et al., 2005). None of them is close enough to our libraries.

7 Other schools have night or weekend shifts, but we do not consider them. These institutions are typically intended for young adults who want to finish their secondary education after dropping out, so their educational incentives and environment are entirely different from those of a typical student.
4 Data


4.1 Quality of education data 

Our measure of education quality is the SABER 11 test, the Colombian equivalent of the SAT, administered by ICFES (Colombian Institute for Evaluation of Education), which is part of the Ministry of Education. It comprehensively evaluates different areas of knowledge, specifically mathematics, verbal and sciences (biology, physics and chemistry). The test is given twice per year because of the two main school calendars, and, although it is not compulsory for graduation, universities require it for admission and use it as a common filter for selecting new students. To ensure comparability, test results are standardized by wave at the Bogota level in each subject area, and the average of these scores (called here the general result) is taken.
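The standardization can be sketched as follows, assuming student-level records for all Bogota test-takers, with a hypothetical `wave` key and one raw score per subject. Each subject is converted to a z-score within its wave (using the population standard deviation, an assumption), and the general result is the average of the subject z-scores.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_scores(records, subjects=("math", "verbal", "science")):
    """Z-score each subject within its test wave and average them.

    records: list of dicts with a 'wave' key and one raw score per subject
    (hypothetical field names). Other keys (e.g. ids) are carried through.
    """
    by_wave = defaultdict(list)
    for r in records:
        by_wave[r["wave"]].append(r)

    out = []
    for rows in by_wave.values():
        # mean and population standard deviation per subject within the wave
        stats = {s: (mean(r[s] for r in rows), pstdev([r[s] for r in rows]))
                 for s in subjects}
        for r in rows:
            z = {k: v for k, v in r.items() if k not in subjects}
            for s in subjects:
                z[s] = (r[s] - stats[s][0]) / stats[s][1]
            z["general"] = mean(z[s] for s in subjects)  # the "general result"
            out.append(z)
    return out
```

Standardizing within each wave removes differences in test difficulty across applications, so calendar A and calendar B cohorts are placed on a common scale before averaging.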

Tables 2 and 3 show average standardized test scores of schools according to their characteristics, including only the universe of schools used in the estimation, that is, Bogota schools located within 3.5 km of the libraries, as shown in Figure 1. Table 3 shows that students attending full-day schools score higher, on average, than students attending double-shift schools. Among the latter, students attending the morning shift score higher, on average, than those attending the afternoon shift. These patterns are related to school management: students attending government-managed schools typically do worse than those attending schools managed by the private sector, which are normally private institutions. These relationships are stable over time and are a common finding in the Colombian literature on the quality of education (Gaviria and Barrientos, 2001). Table 2 shows that test scores also differ across schools in terms of school size, the teacher-student ratio, the female-male student ratio, and teachers' education level. These are all traditional inputs of education that we discuss further in the next section.

Table 4 shows a U-shaped relationship between school quality and distance to the libraries. Schools close to the libraries are normally better than those at a medium-range distance (1-2.5 km), but worse than or similar to those far away (2.5-3.5 km). As this relationship might be driven by the allocation of inputs, the next section analyzes them in more detail.


4.2 Other variables and data restrictions 

To account for other sources of variation that might be correlated with distance to the libraries, we include variables that the literature has identified as key determinants of the quality of education. Variables controlling for institutional characteristics come from the C600 (a registry of students and school staff) and the C100 (a registry of school infrastructure) of the Ministry of Education. Neighborhood controls are derived from the 2005 General Population Census conducted by DANE (the national statistics department). The relationship of these variables to our measures of quality of education is described in Table 3.

Though C100 information is only available from 2002 onwards, it provides valuable information on the physical infrastructure of schools. It includes data on sports facilities, the presence of a school library, and a measure of the quality of educational assets: a dummy equal to one if the school simultaneously has computer, physics and chemistry labs. From the C600 form we introduce several time-varying, school-level variables related to the supply side of education quality. First, we take into account the number of students per school (in logs) and the teacher-pupil ratio of the school. Larger schools are correlated with better results. To capture the overall quality of the facilities, we include the area in square meters of classrooms and sports facilities per student. We also include the proportion of teachers with a graduate degree as a proxy for their human capital. As the public sector incentivizes the concentration of more highly qualified teachers, this variable's relationship with quality appears to be negative, as described by Nunez et al. (2002). Gender differences might be relevant, so we include the proportions of female students and teachers. Finally, we include some controls specific to the examined cohort: its size and the share of female test-takers. These data were cleaned by removing schools with a teacher-student ratio greater than 0.5 (one teacher for every two students) or equal to 0 (no teachers), as such ratios indicate that the data may contain errors.
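The cleaning rule can be expressed as a simple filter. In this sketch the field names (`teachers`, `students`) are hypothetical, and the ratio is taken as teachers per student, so 0.5 means one teacher for every two students; kept rows also receive the log number of students used as a control.

```python
import math


def clean_schools(rows, max_ratio=0.5):
    """Keep school rows with 0 < teachers/students <= max_ratio.

    Rows with no students, no teachers, or more than one teacher per two
    students are treated as data errors and dropped. Kept rows gain the
    teacher-student ratio and the log number of students.
    """
    kept = []
    for r in rows:
        if r["students"] <= 0:
            continue                      # ratio undefined: drop the row
        ratio = r["teachers"] / r["students"]
        if 0 < ratio <= max_ratio:
            kept.append({**r,
                         "teacher_ratio": ratio,
                         "log_students": math.log(r["students"])})
    return kept
```

A ratio of exactly 0.5 is kept, since only ratios strictly greater than 0.5 are described as errors.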

Neighborhood-level controls are available at the census block level for 2005. We averaged the information of the blocks that were at least 50 meters from the school. These controls are the average age and the shares of the block population who are students, who have at most primary education, who immigrated from other municipalities and from rural areas during the last 5 years, who are of working age, who are working or looking for a job, and who fasted for one week.

Tables 5 and 6 report, for schools and students respectively, the number of observations in each range of distance to the library (column 1): the total number (column 2) and the number used in the model (column 3). 8 The difference between columns two and three is due to information gaps in C600. We used hot-deck imputation to minimize the number of missing values, following the implementation of Baez and Buitrago (2010), which builds on the donor-receptor idea of Ñopo (2008).
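Hot-deck imputation fills a missing value (the "receptor") with an observed value from a similar record (the "donor"). The sketch below is a generic random hot deck within imputation cells; the cell variables, the random draw and the field names are illustrative assumptions, and it does not reproduce the specific Baez and Buitrago (2010) implementation.

```python
import random


def hot_deck(rows, var, cell_keys, seed=0):
    """Random hot-deck imputation of `var` within imputation cells.

    rows: list of dicts; a missing value is stored as None.
    cell_keys: keys defining the cells; donors must share the receptor's cell.
    Returns new dicts (inputs are not modified); imputed rows get a flag.
    """
    rng = random.Random(seed)                 # fixed seed for reproducibility
    donors = {}
    for r in rows:                            # collect observed values by cell
        if r[var] is not None:
            donors.setdefault(tuple(r[k] for k in cell_keys), []).append(r[var])

    out = []
    for r in rows:
        r = dict(r)                           # copy so the input is untouched
        if r[var] is None:
            pool = donors.get(tuple(r[k] for k in cell_keys))
            if pool:                          # receptors without donors stay missing
                r[var] = rng.choice(pool)
                r[var + "_imputed"] = True
        out.append(r)
    return out
```

Restricting donors to the receptor's cell (for example, the same sector and locality) keeps imputed values plausible for that type of school; the flag allows checking that results do not hinge on imputed records.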

4.3 Test scores and distance to the libraries 

Having observed the relationships between some school characteristics and the quality of education, and considering the causal impact that the literature attributes to these characteristics, it is prudent to check whether the location of the libraries is correlated with the type of schools. Table 7 addresses this question by calculating the average characteristics of schools located at different ranges from the nearest mega-library. The main observation is that the nearest schools are more likely to be public. As public schools tend to have lower test scores (Nunez et al., 2002; Gaviria and Barrientos, 2001), the correlation between education quality and the distance of the libraries is negative. A first approach to the impact of libraries on test scores is to explore the score-distance relationship after netting out the effect of variation in common determinants from the score. For this, we turn to a classic semi-parametric model: a partially linear regression, presented in Equation 3, 9 which allows for a non-linear relationship. In it, $Y$ is the score, $X$ the controls, and $u$ an error such that $E[u \mid d, X] = 0$. Figure B.2 shows the estimates of $m(d)$, which give the relationship between the score and distance net of the usual controls.


$$Y = m(d) + X\beta + u \qquad (3)$$

8 For public institutions, a school is defined as a campus-shift combination.

9 The estimation follows the differencing algorithm of Yatchew (1997), as implemented by Lokshin (2006).
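The logic of the differencing estimator for Equation 3 can be sketched as follows. This uses first-order differencing, the simplest version of Yatchew's approach (the higher-order optimal differencing weights are not shown), followed by a Gaussian Nadaraya-Watson smoother for $m(d)$; the bandwidth and function name are illustrative assumptions, and this is not Lokshin's implementation.

```python
import numpy as np


def partial_linear(y, X, d, bandwidth):
    """Sketch of Y = m(d) + X beta + u via first-order differencing.

    1. Sort observations by distance d and first-difference Y and X; since
       m(.) is smooth, m(d_i) - m(d_{i-1}) is close to zero for neighbours.
    2. OLS of differenced Y on differenced X (no intercept) estimates beta.
    3. A Gaussian Nadaraya-Watson smoother of Y - X beta on d estimates m(d).
    Returns beta, the sorted distances and m_hat evaluated at them.
    """
    order = np.argsort(d)
    y = np.asarray(y, dtype=float)[order]
    X = np.asarray(X, dtype=float)[order]
    d = np.asarray(d, dtype=float)[order]

    beta, *_ = np.linalg.lstsq(np.diff(X, axis=0), np.diff(y), rcond=None)

    partial_resid = y - X @ beta                  # estimates m(d) + u
    w = np.exp(-0.5 * ((d[:, None] - d[None, :]) / bandwidth) ** 2)
    m_hat = (w @ partial_resid) / w.sum(axis=1)
    return beta, d, m_hat
```

Differencing removes the unknown $m(d)$ without specifying its shape, which is why a U-shaped profile can emerge from the data instead of being imposed.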
We find a U-shaped relationship with a minimum near 1500 meters. As a result, our analysis focuses particularly on schools located between 750 and 2000 meters from the libraries, the range the libraries' impact is likely to reach. However, these graphs are only exploratory, because the estimates still include the unobserved determinants $u$; in fact, the U pattern is found both before and after 2002.

To estimate the effect, we must assume that unobservable variables may vary with distance, but that their variation over time is not related to distance. This restriction allows us to identify the average impact on schools 'close' to the libraries relative to those that are 'distant', and motivates the DiD strategy discussed in the next section.

5 Empirical Strategy 

The impact of libraries on the quality of education is identified using the Difference-in-Differences (DiD) method. We define schools 'near' the libraries as treated and those 'far' from them as controls. That is, we assume that any difference between these two groups of schools would have been preserved had the libraries not been built (the parallel trends assumption). It is important to remember that here 'libraries' refers to the entire intervention in public infrastructure and urban development that occurred in those areas. Thus, the estimation is based on the provision of libraries, not on their intensity of use, which is assumed to be a function of the distance from the school to the physical building.
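In the simplest two-group, two-period version of this design (pre-period 2000-02, post-period 2003-08), the estimand and the parallel trends assumption can be written as follows. This is a standard textbook statement in our own notation, with $\bar{Y}_{g,p}$ the average score of group $g$ in period $p$ and $Y(0)$ the potential score without the libraries; it is not an equation taken from the estimation itself.

$$
\begin{aligned}
\hat{\delta}_{DiD} &= \left(\bar{Y}_{T=1,\text{post}} - \bar{Y}_{T=1,\text{pre}}\right) - \left(\bar{Y}_{T=0,\text{post}} - \bar{Y}_{T=0,\text{pre}}\right), \\
E\left[Y(0)_{\text{post}} - Y(0)_{\text{pre}} \mid T=1\right] &= E\left[Y(0)_{\text{post}} - Y(0)_{\text{pre}} \mid T=0\right].
\end{aligned}
$$

Under the second condition, the change observed for the control schools stands in for the change the treated schools would have experienced without the libraries, so $\hat{\delta}_{DiD}$ identifies the average effect on the treated schools.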

The identification strategy involves two stages: the first measures the magnitude and significance of the impact, and the second decomposes it into the part due to changes in observed inputs and the part due to variation not linked to those inputs. The decomposition addresses the question of complementarities between libraries and traditional determinants of the quality of education, in other words, how the libraries enhance the impact of traditional inputs already present in schools.
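The decomposition step can be illustrated with the standard twofold Oaxaca-Blinder split of a mean gap between two groups $a$ and $b$ into an "explained" part (differences in average inputs, valued at group $b$'s coefficients) and an "unexplained" part (differences in coefficients, that is, in the returns to inputs). This is a generic sketch: how the decomposition is mapped onto the DiD estimates, and which group serves as the reference, are assumptions here, and the function name is hypothetical.

```python
import numpy as np


def oaxaca_blinder(y_a, X_a, y_b, X_b):
    """Twofold Oaxaca-Blinder decomposition of mean(y_a) - mean(y_b).

    explained   = (xbar_a - xbar_b)' beta_b   # gap from different average inputs
    unexplained = xbar_a' (beta_a - beta_b)   # gap from different returns to inputs
    Group b's coefficients are the reference (an assumption; other reference
    choices split the same total gap differently).
    """
    y_a = np.asarray(y_a, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    Xa = np.column_stack([np.ones(len(y_a)), np.asarray(X_a, dtype=float)])
    Xb = np.column_stack([np.ones(len(y_b)), np.asarray(X_b, dtype=float)])

    beta_a, *_ = np.linalg.lstsq(Xa, y_a, rcond=None)
    beta_b, *_ = np.linalg.lstsq(Xb, y_b, rcond=None)
    xbar_a, xbar_b = Xa.mean(axis=0), Xb.mean(axis=0)

    gap = y_a.mean() - y_b.mean()
    explained = (xbar_a - xbar_b) @ beta_b
    unexplained = xbar_a @ (beta_a - beta_b)
    return gap, explained, unexplained
```

Because OLS with an intercept passes through the sample means, the explained and unexplained parts add up exactly to the raw gap; a library effect working through the returns to school inputs would show up in the unexplained part.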

As described before, our treatment indicator is the spatial proximity of schools to the libraries. However, being 'near' or 'far' is an arbitrary definition that requires a selection rule, which is itself part of the research question. Discrete and continuous options were considered to define exposure to treatment using the distance of each school to the libraries, $d$.
A first alternative (continuous approach) imposes a parametric restriction on the relationship between distance to the library and test scores. Given the results from the partially linear regression, it is reasonable to presume that the impact decreases with the inverse of distance up to some far, arbitrary cutoff $R_1$, where we set the impact to exactly 0, including all schools within a fixed radius $R_2$. Hence, we define $T = R_1/d - 1$ if $d < R_1$ and $T = 0$ if $d \geq R_1$. For this specification we present results for radii $R_1 \in \{1500, 2000, 2500, 3000, 3500\}$ and $R_2 = 3500$.

On the other hand, the effect could be discontinuous (discrete approach). Hence, to avoid any assumption on the distance-score relationship, schools within a radius $R_2$ are assigned to the treated ($T = 1$) and control ($T = 0$) groups using an arbitrary distance cutoff $R_1$. This specification, henceforth Discrete I, is represented in Figure 1. An alternative, Discrete II, omits some schools between the treatment and control zones, so that the control zone starts at $R_3 \in [R_1, R_2]$. Implementing different cutoffs did not produce substantial differences. We present results using $R_2 = 3500$, $R_3 = 2000$ and $R_1 \in \{750, 1000, 1250, 1500, 1750, 2000\}$.
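The three exposure definitions can be written as a single assignment rule. This is a minimal sketch assuming distances in meters and strictly positive; the function name, the handling of a school exactly at a cutoff, and the convention of returning `None` for schools excluded from the sample (beyond $R_2$, or inside the Discrete II buffer) are illustrative choices.

```python
def treatment(d, r1, r2=3500.0, r3=None, kind="discrete"):
    """Treatment value T for a school at distance d (meters, assumed > 0).

    kind="continuous": T = r1/d - 1 inside r1 and 0 beyond (continuous approach).
    kind="discrete":   T = 1 inside r1 and 0 beyond (Discrete I); passing r3
                       drops schools with r1 <= d < r3 (Discrete II).
    Returns None for schools excluded from the estimation sample.
    """
    if d > r2:
        return None                      # outside the estimation radius R2
    if kind == "continuous":
        return r1 / d - 1.0 if d < r1 else 0.0
    if d < r1:
        return 1                         # treated zone
    if r3 is not None and d < r3:
        return None                      # Discrete II buffer zone, omitted
    return 0                             # control zone
```

For example, with $R_1 = 1500$ a school 500 m from a library gets a continuous treatment of $1500/500 - 1 = 2$, while a school at exactly 1500 m gets 0, so the continuous measure falls smoothly to zero at the cutoff.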

5.1 Estimation of the general impact (DiD) 

We define the average treatment effect on the treated, $\delta_\tau$, as the impact on average test scores in year $\tau$ for schools located close to the libraries relative to those far from them. In the continuous treatment scenario, the fullest impact occurs for schools located right next to one of the libraries. This parameter is estimated using the classic setup in Equation 4. Let $Y_{it}$ be the average test score of school $i$ in year $t$, $T_i$ the treatment status of each school, $A_t$ a dummy equal to 1 if $t \geq 2003$, $\mathbb{1}(\tau = t)$ an indicator for year $\tau$ being equal to year $t$, and $\gamma_i$ and $\gamma_t$ school and year fixed effects. For this specification, we assume that parallel trends hold conditional on the school-level controls $X_{it}$.

$$Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot \mathbb{1}(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + \varepsilon_{it} \qquad (4)$$
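Equation 4 can be estimated by ordinary least squares with explicit school and year dummies. The sketch below is a minimal numpy version on simulated data, not the authors' Stata code: it drops $A_t$, which is perfectly collinear with the year fixed effects, omits the covariates $X_{it}$ for brevity, and does not reproduce the locality-clustered standard errors reported in the tables. The function name `event_study_did` is hypothetical.

```python
import numpy as np


def event_study_did(y, school, year, treat, post_years=range(2003, 2009)):
    """OLS estimates of delta_tau in
    Y_it = sum_tau delta_tau * T_i * 1(tau = t) + gamma_i + gamma_t + e_it.

    A_t is left out because it is a linear combination of the year dummies.
    """
    y, school, year = np.asarray(y, float), np.asarray(school), np.asarray(year)
    treat = np.asarray(treat, float)
    post_years = list(post_years)
    cols = [treat * (year == tau) for tau in post_years]              # T_i * 1(tau = t)
    cols += [(school == s).astype(float) for s in np.unique(school)]  # school effects
    cols += [(year == t).astype(float) for t in np.unique(year)[1:]]  # year effects, base year dropped
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(post_years, coef[:len(post_years)]))


# Simulated panel: 20 schools observed 2000-2008, the first 8 treated.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(20), 9)
year = np.tile(np.arange(2000, 2009), 20)
treat = (school < 8).astype(float)
true_delta = {2003: 0.10, 2004: -0.05, 2005: 0.20, 2006: 0.0, 2007: 0.15, 2008: 0.05}
effect = np.array([true_delta.get(int(t), 0.0) for t in year])
y = rng.normal(size=20)[school] + rng.normal(size=9)[year - 2000] + effect * treat
print(event_study_did(y, school, year, treat))
```

Because the simulated outcome has no noise, the estimates reproduce `true_delta` up to rounding; with real data each $\delta_\tau$ compares the treated-control gap in year $\tau$ with the average gap over 2000-2002.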

The identification assumption might be too strong: schools located in different areas might follow dissimilar trends due to uncontrolled factors. For instance, migration of people with different willingness to spend on education may shape schools’ investments in ways not captured by our covariates. In essence, some schools might be improving while others are worsening. To address this, we can include school-specific trends¹⁰, $t \cdot \tilde{\gamma}_i$, as shown in equation 5. The limitation of this approach is that trends may differ only linearly.

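$$Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot \mathbb{1}(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + t \cdot \tilde{\gamma}_i + \varepsilon_{it} \qquad (5)$$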
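The role of the school-specific trends can be illustrated with a simulation in which treated schools drift downwards by 0.05 standard deviations per year while the libraries have no effect at all. This is a sketch with hypothetical names and data, not the authors' code; it drops the first school's trend because the year effects already absorb one common linear trend, and it omits $A_t$ and $X_{it}$ as before.

```python
import numpy as np


def did_school_trends(y, school, year, treat, post_years=range(2003, 2009), trends=True):
    """Equation 4 (trends=False) or equation 5 (trends=True) by OLS with dummies."""
    y, school, year = np.asarray(y, float), np.asarray(school), np.asarray(year)
    treat = np.asarray(treat, float)
    post_years = list(post_years)
    t = (year - year.min()).astype(float)          # linear time index
    schools = np.unique(school)
    cols = [treat * (year == tau) for tau in post_years]
    cols += [(school == s).astype(float) for s in schools]              # gamma_i
    if trends:
        # t * gamma~_i; the first school's trend is dropped because the year
        # effects already absorb one common linear trend.
        cols += [t * (school == s) for s in schools[1:]]
    cols += [(year == yr).astype(float) for yr in np.unique(year)[1:]]  # gamma_t
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return dict(zip(post_years, coef[:len(post_years)]))


# Treated schools drift down by 0.05 SD per year, but the libraries have no effect.
rng = np.random.default_rng(1)
school = np.repeat(np.arange(20), 9)
year = np.tile(np.arange(2000, 2009), 20)
treat = (school < 8).astype(float)
y = (rng.normal(size=20)[school] + rng.normal(size=9)[year - 2000]
     - 0.05 * (year - 2000) * treat)
print(did_school_trends(y, school, year, treat, trends=False)[2008])  # biased downwards
print(did_school_trends(y, school, year, treat, trends=True)[2008])   # close to zero
```

Without trends the pre-existing decline shows up as a spurious negative "impact"; with trends it is absorbed, but only because the simulated divergence is exactly linear, which is the limitation noted above.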

5.2 Propensity Score Matching and Synthetic Control 

One of the main concerns with the DiD method in studies with few treated units is how to choose the best control: estimates are highly sensitive to the choice of controls, especially when the unit of observation is an aggregate (e.g., countries, states or schools). Abadie and Gardeazabal (2003) introduced an approach known as the ‘synthetic control’ to deal with these problems. The idea is to choose a set of weights for the control units such that the weighted controls reproduce the pre-intervention outcome path of the treated units, thereby constructing parallel trends before the intervention. However, as suggested by Abadie and Gardeazabal (2003), the synthetic control needs a long pre-intervention period to control for structural patterns in both observables and unobservables (Abadie et al., 2010). Given that only three years are available before the opening of the mega-libraries, and that the objective is to forecast over the following six years, the synthetic control strategy might lead to misleading results. A potentially more suitable alternative is to weaken the DiD parallel trends assumption by introducing matching in the pre-treatment period (Blundell and Dias, 2009). The matching estimator relies on the minimization of a distance function that becomes increasingly hard to estimate as the number of covariates grows. A traditional way to simplify this problem, when there is more than one treated unit, is to match on the predicted probability of being treated, the propensity score (Rosenbaum and Rubin, 1983).

In this paper we combine both approaches by implementing kernel propensity score matching¹¹ (Heckman et al., 1997) that includes the pre-treatment evolution of test scores among the matching covariates, in line with the matching step of the synthetic control. Once the synthetic control is constructed by re-weighting the non-treated schools, the DiD specifications from equations 4 and 5 are applied.¹²

10 For other applications that introduce this technique, see, for instance, Besley and Burgess (2004).
11 The procedure was implemented using psmatch2 (Leuven and Sianesi, 2014) in Stata 12.
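Kernel matching can be sketched in a few lines of numpy. This is a minimal illustration, assuming the Epanechnikov kernel and the 0.06 bandwidth reported under Figures 4 and 5: each treated school's counterfactual is a kernel-weighted average of control schools with nearby propensity scores, and applying those weights to pre/post changes in scores gives a matched DiD. It does not replicate psmatch2's options or defaults, and all function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u**2) for |u| <= 1, zero otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kernel_weights(p_treated, p_control, bandwidth=0.06):
    """Row i holds the weight of every control school in treated school i's
    counterfactual. Rows sum to 1; a treated school with no control within
    the bandwidth (off support) gets a row of zeros."""
    gap = (np.asarray(p_control, float)[None, :]
           - np.asarray(p_treated, float)[:, None]) / bandwidth
    k = epanechnikov(gap)
    totals = k.sum(axis=1, keepdims=True)
    return np.divide(k, totals, out=np.zeros_like(k), where=totals > 0)


def matched_did(dy_treated, dy_control, weights):
    """Mean over on-support treated schools of (own change - weighted control change)."""
    on_support = weights.sum(axis=1) > 0
    counterfactual = weights @ np.asarray(dy_control, float)
    gaps = np.asarray(dy_treated, float) - counterfactual
    return float(gaps[on_support].mean())


w = kernel_weights([0.50, 0.10], [0.50, 0.52, 0.90])
print(w)                                         # second treated school is off support
print(matched_did([1.0, 3.0], [0.0, 0.0, 5.0], w))
```

The control with score 0.90 gets zero weight because it lies outside the bandwidth, and the treated school at 0.10 is dropped for lack of common support.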

In doing so, the underlying identification assumption changes slightly. If, once the observed covariates are taken into account, schools close to and far from the libraries follow similar time trends (or trends that differ only linearly), the estimated impacts can be attributed to the mega-libraries. Note, however, that identification fails if unobserved events affected some of the schools (either close to or far from the libraries) and not the others.

Apart from the 2000-2002 test scores, the matching variables considered are the following: the proportion of teachers with graduate studies, pupil-teacher ratio, public school
dummy, morning school day dummy, complete school day dummy, female-teacher ratio, 
11th grade female-male students ratio, 11th grade students, total students, girls-students 
ratio, built area per student, classrooms area per student, sports area per student, and a 
dummy for the presence of a school library. 
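The propensity score is the fitted probability from a binary model of treatment status on these variables. The text does not state whether a logit or a probit was used, so the sketch below assumes a logit fitted by Newton-Raphson on simulated covariates; the function name and data are hypothetical.

```python
import numpy as np


def logit_propensity(covariates, treat, max_iter=50, tol=1e-10):
    """Fit P(T = 1 | X) by logistic regression (Newton-Raphson).

    Returns fitted propensity scores and coefficients (intercept first).
    Assumes no perfect separation; otherwise the iterations diverge.
    """
    X = np.column_stack([np.ones(len(treat)), np.asarray(covariates, float)])
    t = np.asarray(treat, float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (t - p)                        # score of the log-likelihood
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # minus the Hessian
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta)), beta


# Simulated example with two school-level covariates.
rng = np.random.default_rng(2)
x = rng.normal(size=(300, 2))
true_p = 1.0 / (1.0 + np.exp(-(0.5 + x @ np.array([1.0, -1.0]))))
t = (rng.uniform(size=300) < true_p).astype(float)
pscore, coef = logit_propensity(x, t)
print(coef)
```

With an intercept in the model, the average fitted score equals the share of treated schools at convergence, a quick sanity check before the scores are passed to the kernel weights.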

5.3 DiD-OB: Decomposition of the impact 

As discussed, the construction of the libraries implied a massive urban development, so the mega-libraries probably triggered changes in other inputs. For instance, they could lead to emigration from the area through changes in real estate prices, or they could change the number of private schools or the teacher-student ratio. Part of the observed change in the gap between schools close to and far from the libraries would then be due to this channel. We are therefore interested in whether the program affected the inputs and whether that variation explains part of the difference in outcomes, call it $\Delta_X$, and in the part of the impact that is not due to the inputs, $\Delta_0$. The latter could reflect a change in the contribution of highly educated teachers when the libraries are present, or a change in the efficiency of public schools that engage with the libraries’ services. In that case, $\Delta_0$ would more likely reflect complementarity between schools and libraries. We separate these two parts using a novel strategy, proposed in this study, that introduces the Oaxaca (1973) and Blinder (1973) decomposition into a DiD context (see the appendix for details). The conditions for identification of the effect are the usual DiD parallel trends, but without conditioning on covariates. The decomposition is obtained by estimating equation 6.

12 As the matching is based on discrete categories, the continuous approach cannot be implemented.


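$$Y_{it} = \alpha_0 + \alpha_1 T_{it} + \alpha_2 A_{it} + \alpha_3 X_{it} + \alpha_4 X_{it} T_{it} + \alpha_5 X_{it} A_{it} + \alpha_6 T_{it} A_{it} + \alpha_7 X_{it} T_{it} A_{it} + u_{it} \qquad (6)$$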

From this equation, we can define the impact generated by the variation in covariates induced by the programme, $\Delta_X$, and the variation that is unrelated to them, $\Delta_0$:

$$\delta = \big(E[Y \mid T=1, A=1] - E[Y \mid T=0, A=1]\big) - \big(E[Y \mid T=1, A=0] - E[Y \mid T=0, A=0]\big)$$

$$\delta = \Delta_0 + \Delta_X$$

$$\begin{aligned}
\delta = {} & \alpha_6 + (\alpha_4 + \alpha_5 + \alpha_7)\,E[X \mid T=1, A=1] - \alpha_5\,E[X \mid T=0, A=1] - \alpha_4\,E[X \mid T=1, A=0] \\
& + \alpha_3 \big[\big(E[X \mid T=1, A=1] - E[X \mid T=0, A=1]\big) - \big(E[X \mid T=1, A=0] - E[X \mid T=0, A=0]\big)\big]
\end{aligned}$$
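The algebra can be checked numerically: with a single covariate, plugging the coefficients of equation 6 into the four cell means and taking their difference-in-differences should give the same number as the two-term expansion. This sketch reads $\Delta_X$ as the $\alpha_3$ term (the DiD in covariate means) and $\Delta_0$ as the remaining terms; that split, the random coefficients and the function names are our assumptions for illustration.

```python
import numpy as np


def fitted_cell_mean(alpha, T, A, x):
    """E[Y | T, A] implied by equation 6 when E[X | T, A] = x (one covariate)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = alpha
    return (a0 + a1 * T + a2 * A + a3 * x + a4 * x * T + a5 * x * A
            + a6 * T * A + a7 * x * T * A)


def did_ob_decomposition(alpha, xbar):
    """Return (Delta_0, Delta_X); xbar maps (T, A) to E[X | T, A]."""
    a = alpha
    delta_x = a[3] * ((xbar[1, 1] - xbar[0, 1]) - (xbar[1, 0] - xbar[0, 0]))
    delta_0 = (a[6] + (a[4] + a[5] + a[7]) * xbar[1, 1]
               - a[5] * xbar[0, 1] - a[4] * xbar[1, 0])
    return delta_0, delta_x


rng = np.random.default_rng(3)
alpha = rng.normal(size=8)
xbar = {(T, A): rng.normal() for T in (0, 1) for A in (0, 1)}
delta = (fitted_cell_mean(alpha, 1, 1, xbar[1, 1]) - fitted_cell_mean(alpha, 0, 1, xbar[0, 1])
         - fitted_cell_mean(alpha, 1, 0, xbar[1, 0]) + fitted_cell_mean(alpha, 0, 0, xbar[0, 0]))
d0, dx = did_ob_decomposition(alpha, xbar)
print(delta, d0 + dx)   # equal up to floating-point rounding
```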

Standard errors are calculated by bootstrapping, because no analytical expression for them is available. To present results by year, the strategy is implemented by comparing the pre-intervention period with each treatment year in a separate regression.
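A bootstrap standard error can be sketched as follows. The paper does not detail its resampling scheme or the number of replications, so both are assumptions here: schools are resampled with replacement as whole units, keeping each school's years together, and 999 replications are drawn. The function name is hypothetical; in practice `stat` would re-run the DiD-OB regression on each resample and return $\Delta_0$ or $\Delta_X$.

```python
import numpy as np


def bootstrap_se(stat, units, n_boot=999, seed=0):
    """Bootstrap standard error of stat(sample), resampling whole units.

    units: list with one element per school (e.g. that school's rows for the
           pre-period and one treatment year), so a school's years stay together.
    stat:  function mapping a list of units to a scalar estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(units)
    draws = [stat([units[j] for j in rng.integers(0, n, size=n)])
             for _ in range(n_boot)]
    return float(np.std(draws, ddof=1))


# Toy check: for the mean of 400 draws with SD 1 the SE should be close to 0.05.
sample = list(np.random.default_rng(4).normal(size=400))
se = bootstrap_se(lambda s: float(np.mean(s)), sample, n_boot=999, seed=5)
print(round(se, 3))
```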

6 Results and discussion 

6.1 Classic DiD strategy 

First, using the parametric (continuous) approach, we compare the evolution of the treatment group in each year from 2003 to 2008 against the pre-treatment period, 2000 to 2002. In Table 8, we consider the intensity of treatment to be inversely proportional to distance. It ranges from 1, the intensity received by a school in front of the library, to 0, for a school located $R_1$ meters away or further. The general impact of being right beside the library implies an increase in average scores of between 0.02 and 0.06 standard deviations ($R_1 = 1500$, for 2003 and 2008, respectively). This impact is lower when we assume a slower decay of the benefit with distance (higher $R_1$), suggesting that the area of impact is relatively small. However, none of these impacts is statistically different from 0.

Table 9 presents the results from the discrete approach. In Panel A, the treatment group consists of the schools between 0 and $R_1$ meters from the libraries, and the controls are those from $R_1$ to $R_2$ (fixed at 3.5 km), as shown in the map in Figure 1. Estimates range from 0.21 for the smallest radius in 2005 to -0.05 for the largest. This is consistent with the previous specification, which found that the impact is greater for the nearest schools. However, there is no evidence of an impact different from 0. Similar results are found in the last specification, shown in Panel B, where the controls are the schools between $R_3 = 2000$ and $R_2$; that is, schools between $R_1$ and $R_3$ meters are excluded. These results are also presented in Figure 3 for comparison.

Figure 3: Euclidean Distance Estimators. Panels: A. Discrete I; B. Discrete II; C. Continuous: Exponential. Markers: R1=1500, R1=2500, R1=3500. Confidence intervals at 95% level.


Equation 5 relaxes the parallel trends assumption by allowing school-specific trends. Figure B.3 shows that, for both the Discrete II and the continuous approaches, schools very close to the libraries seem to have a declining trend in the outcome. However, this pattern is still not statistically different from zero.

One clear concern is the measure of distance. The Euclidean approach might not capture the real cost of travelling between points in certain contexts; for instance, there might be restrictions due to geographic features or infrastructure. In this urban context, however, it is likely a reasonable approximation. An alternative that takes these issues into account is road distance, which measures the total distance needed to reach a mega-library using the road infrastructure. Figure B.4 presents the main estimates using this approach. To make the main and additional results comparable, road distance was rescaled using a linear function (see section 3), since the relevant difference might come not from the absolute position of each school but from its relative one. The remainder of this paper considers only the Euclidean measure.

6.2 Synthetic Control 

The next step is to introduce the matching strategy into the DiD. The main objective is to ensure that schools close to the libraries are compared with similar schools far from them. To achieve this, schools were matched on the propensity score. Figures 4 and 5 show that, once the matching weights are introduced, the propensity score distribution of the synthetic control group resembles that of the treated schools (under each treatment definition). The purpose of this step is to ensure that, by matching on the score, the covariates are matched as well.

We can check the performance of the technique in Tables 10 and 11, for Discrete I and II respectively. For each distance definition, the tables present the difference in each matching variable between treatment and control groups before (General) and after (Matched) matching, as well as the percentage reduction in the standardized bias (B.R.). Stars in the tables report t-tests for equality of means, where the null hypothesis is that the difference equals 0. The matched results appear balanced and, given that we match on the outcome trend before the intervention, the trend of the resulting synthetic control group closely resembles that of the treatment group. A graphic representation is presented in Figures B.5 and B.6. The only case in which the technique does not look as successful is Discrete II, where the treatment group seems to follow a quite different trend.
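The B.R. column can be reproduced from group means and variances. A minimal sketch, assuming the usual Rosenbaum-Rubin standardized difference with the unmatched-sample variances in the denominator both before and after matching (a common convention that we assume rather than take from the paper); the example numbers are hypothetical. A negative reduction, like the -25.9 reported for the 2002 score in Table 10, means that matching increased the standardized bias for that variable.

```python
import math


def standardized_bias(mean_t, mean_c, var_t, var_c):
    """Standardized % bias: 100 * (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    return 100.0 * (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def bias_reduction(sb_before, sb_after):
    """Percentage reduction in |standardized bias| achieved by matching.

    Negative values mean that matching increased the imbalance.
    """
    return 100.0 * (1.0 - abs(sb_after) / abs(sb_before))


# Hypothetical covariate: treated mean 0.30, controls 0.10 before and 0.28 after
# matching; unmatched-sample variances are used in both denominators.
before = standardized_bias(0.30, 0.10, 0.04, 0.05)
after = standardized_bias(0.30, 0.28, 0.04, 0.05)
print(round(bias_reduction(before, after), 1))
```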


Figure 4: Propensity Score Matching at 2002: Discrete I. Panels: R1=1000, R1=1500, R1=2000. Series: Treated, Control, Matched Control. Kernel: Epanechnikov, Bandwidth: 0.06.


Figure 5: Propensity Score Matching at 2002: Discrete II. Panels: R1=1000, R1=1500, R1=2000. Series: Treated, Control, Matched Control. Kernel: Epanechnikov, Bandwidth: 0.06.


Apart from the quality of matching, Figures B.5 and B.6 also tell another story. Schools closer to the libraries seem to have a decreasing trend compared with distant schools that are comparable in key covariates. This is reflected in the DiD estimates in Figure 6. In contrast with Figure 3, almost all of the estimates are negative and, for 2006 and 2007, some of them are significant. In other words, after the libraries were built, nearby schools, especially those very close to the libraries, started to perform worse than similar schools farther away. This means either that the libraries and the urban development around them decreased student performance relative to peers¹³, or that the identification assumption does not hold as well as desired.


Figure 6: Matching at 2002 Estimators. Panels: A. Discrete I; B. Discrete II. Markers: R1=1500, R1=2500, R1=3500. Confidence intervals at 95% level.


As described before, Figure B.6 shows, for the 1000-meter definition under Discrete II, that the declining trend of some of these schools started before the libraries were built, and this was not fully controlled for by the matching. To address this, performance data were de-trended by school (see equation 5). Figure 7 and Table 12 present the results of this approach. The estimated coefficients are still negative but are not statistically different from zero at the 95% level.

13 It might be that these schools did improve, but by less than other schools in the city, which form the base of our standardization.
Figure 7: Matching at 2002 Estimators with School-Specific Trends. Panels: A. Discrete I; B. Discrete II. Markers: R1=1500, R1=2500, R1=3500. Confidence intervals at 95% level.


6.3 Blinder-Oaxaca Decomposition 

So far, there seems to be no significant change in the relationship between distance to the libraries and average test scores in the mathematics, science and verbal sections.¹⁴ It might be the case that the urban transformation was related to changes in inputs of the education production function. Table 13 studies this via the DiD Oaxaca-Blinder decomposition proposed above, bearing in mind that the identification assumptions are stronger than in the simple DiD analysis. In most cases, the part of the test score difference between schools close to and far from the libraries that is due to the observed inputs ($\Delta_X$) is negative. The direct impact of the libraries on test scores ($\Delta_0$) is between 0.1 and 0.2 standard deviations for schools located between 0 and 1.5 km from the libraries. As a reference, the difference between students with college-graduate mothers and the rest of the same sample (at most 3.5 km from either library) is 0.6 standard deviations. However, these results are not statistically different from 0.

14 Results for each of these scores separately are not meaningfully different from those presented here.
6.4 Summary

The fact that estimation procedures with different sets of assumptions provide similar results gives a good idea of the underlying relationship between the construction of the mega-libraries and the quality of education: there is no evidence of a positive and statistically significant impact of the libraries on average standardized scores. These results can be interpreted in several ways. First, the fact that the point estimates are positive but the variance is large could be related to the small number of observations available (around 190 schools per year). If that is the case, any positive relationship between public libraries and schools’ scores is likely to be small. This does not mean that the libraries are useless for education: they could improve skills that are not measured by test scores but are important for society, or provide benefits such as safe spaces and exposure to cultural activities. The available information does not allow us to test those alternatives. Second, the high variance could arise because the libraries have a positive impact only on the schools, students or teachers that decided to take advantage of them, and zero impact on those that did not. Heterogeneous impacts are the rule, not the exception, in the literature on educational inputs (Murnane and Ganimian, 2014).¹⁵ Without further information on the selection mechanism, it is impossible to determine the impact only on those schools, students or teachers that are willing to take advantage of the public infrastructure.

If schools, students or teachers at similar distances from the libraries use the library facilities at different rates, policy may need not only to build and run these public facilities but also to introduce incentive schemes that encourage their use. Glewwe and Kremer (2006) argue that providing resources is insufficient to improve student performance and that teachers should be trained to make the most of those resources. Moreover, in the theoretical framework proposed by Witte and Geys (2011), the provision of most public goods, in this case the libraries, requires two stages of policy: the first for the construction of the libraries, and the second to work on how these programmatic inputs are transformed into observed and desired educational outputs. For instance, prizes for teachers and students for projects that involve the use of these resources might be relevant.

15 Murnane and Ganimian (2014) highlight three cases: high- and low-education parents responded very differently to initiatives to empower school councils in Niger (Beasley and Huillery, 2012); low- and high-achieving students derived very different benefits from free English-language textbooks in Kenya (Glewwe et al., 2009); and rural girls did not benefit nearly as much as urban boys from the use of LEGO kits to teach science in Peru (Beuermann et al., 2013).

7 Conclusions 

We have analyzed the impact on the quality of education, measured by mathematics, science and verbal SABER 11 scores, of the construction of two large public libraries that involved the transformation of low-income urban areas in Bogota, Colombia. To do so, we measured whether the construction of the libraries changed the test scores of nearby schools, controlling for observable variables related to student performance. We opted for a DiD approach to analyze the evolution of the relationship between distance to the library and average test scores at the school level before and after the libraries opened. This approach assumes that the effect of the libraries decays with distance and that, without the intervention, the relationship would have remained unchanged over time. We also propose and implement a decomposition of the effect that accounts for potential variations in traditional determinants of the quality of education.

The libraries analyzed are public, education-related infrastructure that is progressive 
in a context of inequality in access to quality school education. Both libraries were built in 
areas populated by the less well-off and where schools have relatively poor facilities. Thus, 
the policy has the potential to boost equality of opportunity in terms of quality of education. However, we find impacts of the libraries on average standardized test scores that are not statistically different from zero. That is, there is no evidence that schools close to the libraries gained a clear advantage in test scores over schools with similar characteristics located farther from the new public infrastructure.

It is important to note that the results are valid only if the assumptions of the identification strategy hold. In general, there are two main scenarios in which they would fail. First, the intensity of library use may be unrelated to distance. For instance, there could be a network of teachers who take advantage of the library facilities even though their schools are not close to the libraries. Alternatively, the network of medium and small libraries might be so well connected with the more distant mega-libraries that access does not differ with distance. Second, schools close to and far from the libraries might have been affected heterogeneously by other events not fully captured by the observed covariates. For example, patterns of migration or crime in the zones near the libraries that did not alter cohort sizes, gender composition or any other observed input relative to other neighborhoods could explain these results.

These results do not necessarily mean that libraries do not improve the quality of education. On one hand, libraries might foster skills that are not directly reflected in test scores, or skills that are reflected in them only at later stages of students’ lives, such as in college. We are unable to assess these cases with the present methodology. On the other hand, if a direct objective of these programs is to raise test scores, our results suggest that the policies that introduced these public facilities should be complemented with stronger programs that link and coordinate them with existing educational institutions. The capacity to reach the target population (school students and teachers) is an important part of the policy and might require more attention from local governments. For instance, prizes for teachers and students for projects that involve the use of these resources might be relevant.
A. Tables


Table 1: Travelling time to school

Time                       Freq.   Cum.
Less than 10 min.            51%    51%
Between 10 and 20 min.       26%    77%
Between 20 and 30 min.       23%   100%

Source: DANE Population Census 2005


Table 2: Average test score by institutional and environment characteristics

Year              2000    2001    2003    2004    2005    2006    2007    2008   Total
School day
  Complete       0.040  -0.075  -0.017  -0.046   0.020   0.016   0.030   0.051   0.002
  Morning       -0.065  -0.209  -0.193  -0.288  -0.251  -0.313  -0.394  -0.365  -0.263
  Afternoon     -0.252  -0.451  -0.356  -0.362  -0.420  -0.474  -0.460  -0.475  -0.408
  Total         -0.082  -0.234  -0.174  -0.217  -0.197  -0.235  -0.245  -0.234  -0.204
Type of school
  Public        -0.139  -0.312  -0.256  -0.289  -0.339  -0.410  -0.432  -0.414  -0.328
  Private       -0.034  -0.162  -0.097  -0.147  -0.062  -0.068  -0.073  -0.060  -0.088
  Total         -0.082  -0.234  -0.174  -0.217  -0.197  -0.235  -0.245  -0.234  -0.204

Source: Own calculations based on SABER 11 (includes imputations).
Table 3: Average test score by infrastructure and teaching force

Year                    2000   2001   2003   2004   2005   2006   2007   2008  Total
Students
  Less than 300        -0.26  -0.52  -0.44  -0.42  -0.41  -0.33  -0.41  -0.29  -0.38
  Between 300-600      -0.22  -0.36  -0.15  -0.21  -0.13  -0.19  -0.26  -0.16  -0.21
  Between 600-1000     -0.02  -0.15  -0.03  -0.14  -0.19  -0.17  -0.05  -0.14  -0.12
  More than 1000        0.13  -0.01  -0.12  -0.09  -0.10  -0.16  -0.16  -0.18  -0.10
  Total                -0.06  -0.21  -0.15  -0.18  -0.19  -0.20  -0.20  -0.19  -0.17
Teacher-student ratio
  Less than .03        -0.08  -0.57  -0.32  -0.29  -0.31  -0.34  -0.25  -0.28  -0.31
  Between .03-.04      -0.06  -0.16   0.01  -0.19  -0.21  -0.25  -0.20  -0.27  -0.18
  Between .04-.05      -0.11  -0.21  -0.00  -0.18  -0.00  -0.04  -0.17   0.01  -0.11
  Between .05-.06       0.01  -0.10  -0.46  -0.03  -0.12  -0.19  -0.19  -0.27  -0.13
  More than .06        -0.21  -0.45  -0.37  -0.35  -0.30  -0.34  -0.40  -0.36  -0.36
  Total                -0.08  -0.23  -0.17  -0.22  -0.20  -0.23  -0.25  -0.23  -0.20
Girls-students ratio
  Less than 0.15        0.15   0.41   0.29   0.11   0.13   0.11   0.03  -0.03   0.15
  Between 0.15-0.43     0.10  -0.06  -0.08  -0.11  -0.11  -0.15  -0.28  -0.15  -0.11
  Between 0.43-0.48    -0.11  -0.29  -0.19  -0.19  -0.13  -0.18  -0.06  -0.20  -0.17
  Between 0.48-0.52    -0.20  -0.40  -0.31  -0.30  -0.36  -0.37  -0.40  -0.33  -0.34
  Between 0.52-0.85    -0.22  -0.36  -0.18  -0.39  -0.23  -0.28  -0.34  -0.42  -0.30
  More than 0.85        0.24   0.26   0.33   0.13   0.09   0.01   0.03   0.16   0.16
  Total                -0.08  -0.23  -0.17  -0.22  -0.20  -0.23  -0.25  -0.23  -0.20
Basic level teachers
  Less than .25        -0.03  -0.20  -0.11  -0.16  -0.19  -0.22  -0.20  -0.21  -0.17
  Between .25-.5       -0.26  -0.35  -0.43  -0.47  -0.13  -0.38  -0.36  -0.21  -0.33
  Between .5-.75        0.21  -0.03    n/a  -0.22  -0.36    n/a    n/a    n/a  -0.05
  More than .75        -0.31  -0.83  -0.66  -0.52  -0.76  -0.59    n/a  -0.90  -0.61
  Total                -0.06  -0.22  -0.15  -0.20  -0.19  -0.23  -0.21  -0.21  -0.19
Highest level teachers
  Less than .25        -0.08  -0.25  -0.19  -0.24  -0.14  -0.18  -0.17  -0.17  -0.18
  Between .25-.5       -0.16  -0.37  -0.15  -0.55  -0.17  -0.18  -0.81  -0.26  -0.30
  Between .5-.75       -0.12  -0.28  -0.19  -0.27  -0.33  -0.32  -0.41  -0.34  -0.29
  More than .75        -0.04  -0.20  -0.08  -0.18  -0.11  -0.30  -0.24  -0.32  -0.19
  Total                -0.09  -0.26  -0.18  -0.24  -0.18  -0.21  -0.23  -0.22  -0.20

Source: Own calculations based on C600 and SABER 11 (includes imputations). n/a: no value reported.
Table 4: Average test score by distance

Distance to library (meters)   2000-2002   2003-2005   2006-2008    Total
Less than 1000                    -0.141      -0.048      -0.150   -0.110
Between 1000-2500                 -0.239      -0.306      -0.346   -0.306
More than 2500                    -0.112      -0.140      -0.177   -0.147
Total                             -0.161      -0.196      -0.238   -0.204

Source: Own calculations based on SABER 11 (includes imputations).


Table 5: Schools by distance

Distance to the library (meters)   Schools   Used in the models
0-500m                                   5                    4
500m-1000m                              15                   11
1000m-1500m                             28                   27
1500m-2000m                             30                   24
2000m-2500m                             48                   40
2500m-3000m                             45                   39
3000m-3500m                             45                   38
3500m-4000m                             59                   49
Total                                  275                  232

Source: Own calculations


Table 6: Students by distance

Distance to the library (meters)   Students   Used in the models
0-500m                                  237                  115
500m-1000m                             2996                 2888
1000m-1500m                            5372                 5178
1500m-2000m                            5229                 4820
2000m-2500m                            6322                 5629
2500m-3000m                            7634                 7032
3000m-3500m                            6263                 6086
3500m-4000m                            7685                 7195
Total                                 41738                38943

Source: Own calculations
Table 7: Distribution by distances and school characteristics

Distance to the library            0-1 Km (%)   1-2 Km (%)   2-4 Km (%)
Type of school
  Public                                59.84        58.96        50.80
  Private                               40.16        41.04        49.20
  Total                                   100          100          100
Post-graduate teachers ratio
  Less than 30%                         51.18        61.32        63.19
  Between 30% and 60%                   25.20        16.98        19.93
  More than 60%                         23.62        21.70        16.88
  Total                                   100          100          100
School day
  Complete                              31.50        37.26        42.90
  Morning                               35.43        28.07        24.64
  Afternoon                             33.07        34.67        32.46
  Total                                   100          100          100
Student-teacher ratio
  Less than 20                          20.47        21.70        23.84
  Between 20 and 30                     59.84        58.96        54.06
  More than 30                          19.69        19.34        22.10
  Total                                   100          100          100
School size
  More than 1000 students               39.37        50.47        27.90
  Between 500 and 1000 students         37.01        25.47        38.84
  Less than 500 students                23.62        24.06        33.26
  Total                                   100          100          100
Gender of the school
  Boys or girls school                      0        11.79        11.67
  Coeducational school                    100        88.21        88.33
  Total                                   100          100          100
Table 8: DiD Continuous Specification: Exponential

Estimated values of $\delta_\tau$ from

$$Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot \mathbb{1}(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + \varepsilon_{it}$$

Exponential specification: for a school at distance $d_i$ from a library, $T_i = R_1/d_i - 1$ if $d_i < R_1$ and $T_i = 0$ if $d_i \geq R_1$.

Distance def     2003     2004     2005     2006     2007     2008
R1=1500          0.03    -0.02     0.06     0.05     0.04     0.06
                (0.07)   (0.05)   (0.08)   (0.09)   (0.10)   (0.09)
R1=2000          0.02    -0.01     0.03     0.03     0.02     0.04
                (0.04)   (0.03)   (0.05)   (0.06)   (0.07)   (0.06)
R1=2500          0.01    -0.01     0.02     0.02     0.01     0.03
                (0.03)   (0.02)   (0.04)   (0.04)   (0.05)   (0.04)
R1=3000          0.01    -0.00     0.02     0.01     0.01     0.03
                (0.03)   (0.02)   (0.03)   (0.03)   (0.04)   (0.04)
R1=3500          0.01    -0.00     0.01     0.01     0.01     0.02
                (0.02)   (0.02)   (0.03)   (0.03)   (0.03)   (0.03)

R2=3500. Standard errors clustered by locality in parentheses. Significance level: * 90%, ** 95%, *** 99%.




Table 9: DiD Discrete

Estimated values of $\delta_\tau$ from

$$Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot \mathbb{1}(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + \varepsilon_{it}$$

A. Specification I: schools between 0 and R1 meters are treated ($T_i = 1$), and schools from R1 to R2 meters are controls ($T_i = 0$).

Distance def     2003     2004     2005     2006     2007     2008
R1=750           0.04     0.02     0.21     0.12     0.10     0.15
                (0.19)   (0.17)   (0.22)   (0.26)   (0.28)   (0.26)
R1=1000          0.10     0.04    -0.02     0.04    -0.03     0.02
                (0.13)   (0.11)   (0.14)   (0.16)   (0.16)   (0.16)
R1=1250          0.10     0.06    -0.02     0.03    -0.02     0.04
                (0.09)   (0.08)   (0.10)   (0.11)   (0.12)   (0.11)
R1=1500          0.03     0.05    -0.03     0.04    -0.02     0.05
                (0.07)   (0.06)   (0.07)   (0.07)   (0.08)   (0.08)
R1=1750          0.02    -0.00    -0.06    -0.00    -0.04     0.03
                (0.06)   (0.06)   (0.06)   (0.07)   (0.07)   (0.07)
R1=2000          0.02    -0.01    -0.02    -0.05    -0.05     0.02
                (0.06)   (0.05)   (0.06)   (0.06)   (0.06)   (0.06)

B. Specification II: schools between 0 and R1 meters are treated ($T_i = 1$), and schools from R3 to R2 meters are controls ($T_i = 0$).

Distance def     2003     2004     2005     2006     2007     2008
R1=750           0.06     0.02     0.18     0.09     0.07     0.14
                (0.19)   (0.17)   (0.22)   (0.26)   (0.28)   (0.26)
R1=1000          0.10     0.03    -0.03     0.01    -0.05     0.02
                (0.13)   (0.11)   (0.14)   (0.15)   (0.16)   (0.16)
R1=1250          0.10     0.05    -0.03    -0.00    -0.04     0.03
                (0.09)   (0.08)   (0.10)   (0.11)   (0.12)   (0.11)
R1=1500          0.03     0.03    -0.03     0.01    -0.03     0.04
                (0.07)   (0.06)   (0.07)   (0.07)   (0.08)   (0.08)
R1=1750          0.02    -0.01    -0.05    -0.03    -0.05     0.03
                (0.06)   (0.06)   (0.06)   (0.07)   (0.07)   (0.07)
R1=2000          0.02    -0.01    -0.02    -0.05    -0.05     0.02
                (0.06)   (0.05)   (0.06)   (0.06)   (0.06)   (0.06)

R2=3500, R3=2000. Standard errors clustered by locality in parentheses. Significance level: * 90%, ** 95%, *** 99%.
Table 10: Balance Status after Matching: Discrete I
Avg Std Test Score: 2001 0.06 -0.00 93.2 0.04 -0.02 21.2 0.09 0.02 79.3

Avg Std Test Score: 2002 -0.09 0.01 90.5 -0.09 0.00 98.8 -0.01 0.01 -25.9 

Significance level for t-tests for equality of means: * 90%, ** 95%, *** 99% 



Table 11: Balance Status after Matching: Discrete II
Avg Std Test Score: 2001 0.08 0.01 86.8 0.06 -0.03 39.4 0.09 0.02 79.3

Avg Std Test Score: 2002 -0.08 0.03 47.4 -0.07 -0.03 42.5 -0.01 0.01 -25.9 

Significance level for t-tests for equality of means: * 90%, ** 95%, *** 99% 



Table 12: DiD Discrete after Matching Including School-Specific Trends 




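Estimated values of $\delta_\tau$ from

$$Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, \mathbb{1}(\tau = t) \cdot T_i + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + W_{it} \cdot \gamma^{*} + \varepsilon_{it}$$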


A. Specification I:

Schools between 0 and R1 meters are treated, $T_i = 1$, and schools from R1 to R2 meters are controls, $T_i = 0$.

Distance Def    2003     2004     2005     2006     2007     2008
R1=750          -0.23    -0.22    0.02     -0.14    -0.29    -0.09
                (0.15)   (0.25)   (0.24)   (0.38)   (0.51)   (0.61)
R1=1000         -0.12    -0.16    -0.20    -0.24    -0.41    -0.24
                (0.11)   (0.15)   (0.20)   (0.23)   (0.28)   (0.35)
R1=1250         -0.08    -0.11    -0.22    -0.21    -0.33    -0.27
                (0.11)   (0.16)   (0.22)   (0.26)   (0.29)   (0.35)
R1=1500         -0.06    -0.04    -0.14    -0.10    -0.19    -0.13
                (0.09)   (0.11)   (0.15)   (0.18)   (0.21)   (0.24)
R1=1750         -0.06    -0.06    -0.16    -0.13    -0.19    -0.14
                (0.09)   (0.11)   (0.14)   (0.17)   (0.19)   (0.23)
R1=2000         -0.06    -0.04    -0.08    -0.12    -0.14    -0.05
                (0.08)   (0.11)   (0.14)   (0.17)   (0.20)   (0.23)

B. Specification II:

Schools between 0 and R1 meters are treated, $T_i = 1$, and schools from R3 to R2 meters are controls, $T_i = 0$.

Distance Def    2003     2004     2005     2006     2007     2008
R1=750          -0.27    -0.15    0.01     -0.09    -0.24    -0.01
                (0.20)   (0.32)   (0.29)   (0.46)   (0.59)   (0.71)
R1=1000         -0.20*   -0.24    -0.39    -0.45    -0.56*   -0.42
                (0.11)   (0.18)   (0.26)   (0.28)   (0.32)   (0.39)
R1=1250         -0.05    -0.09    -0.25    -0.23    -0.35    -0.28
                (0.13)   (0.17)   (0.23)   (0.27)   (0.30)   (0.34)
R1=1500         -0.07    -0.04    -0.18    -0.14    -0.22    -0.14
                (0.09)   (0.12)   (0.17)   (0.20)   (0.22)   (0.25)
R1=1750         -0.07    -0.04    -0.13    -0.11    -0.18    -0.12
                (0.09)   (0.11)   (0.15)   (0.18)   (0.21)   (0.24)
R1=2000         -0.06    -0.04    -0.08    -0.12    -0.14    -0.05
                (0.08)   (0.11)   (0.14)   (0.17)   (0.20)   (0.23)

R2=3500, R3=2000. Standard errors clustered by locality in parentheses. Significance level: * 90%, ** 95%, *** 99%.
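The two specifications differ only in which schools serve as controls. As a minimal sketch, assuming each school's distance to the library is already measured in meters and that every band is closed on its upper edge (the tables do not say how ties at a boundary are handled), the assignment rule can be written in Python. `assign_treatment` is a hypothetical helper, not code from this study; its defaults use the R2=3500 and R3=2000 stated in the table notes.

```python
from typing import Optional


def assign_treatment(dist_m: float, spec: str, r1: float,
                     r2: float = 3500.0, r3: float = 2000.0) -> Optional[int]:
    """Hypothetical helper: 1 = treated, 0 = control, None = not in the sample.

    Assumes each distance band is closed on its upper edge (d <= R).
    """
    if spec not in ("I", "II"):
        raise ValueError("spec must be 'I' or 'II'")
    if dist_m < 0:
        raise ValueError("distance must be non-negative")
    if dist_m <= r1:
        return 1      # 0 to R1 meters: treated in both specifications
    if dist_m > r2:
        return None   # beyond R2: neither treated nor control
    if spec == "I":
        return 0      # Specification I: every school between R1 and R2 is a control
    # Specification II: only R3 to R2 meters are controls; the R1-R3 ring is dropped
    return 0 if dist_m > r3 else None


distances = [300.0, 1200.0, 2600.0, 4100.0]
print([assign_treatment(d, "I", r1=1000) for d in distances])   # [1, 0, 0, None]
print([assign_treatment(d, "II", r1=1000) for d in distances])  # [1, None, 0, None]
```

Under this sketch, setting R1 equal to R3 = 2000 makes Specification II collapse to Specification I, which is consistent with the identical R1=2000 rows reported for the two specifications.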
Table 13: BO-DD Discrete


Blinder-Oaxaca decomposition of the treatment effect: $\delta = \Delta_0 + \Delta_x$

$\delta$: Total impact

$\Delta_x$: Impact due to variation in covariates

$\Delta_0$: Impact due to other channels




Distance Def  Component  2003       2004       2005       2006       2007       2008       Treated/Controls
R1=750        δ          0.2008     0.2252     0.4293     0.3343     0.1444     0.1726     10/182
                         (0.3981)   (0.3644)   (0.4220)   (0.5306)   (0.4816)   (0.4339)
              Δ_0        0.0310     0.2346     0.5356     0.4517     0.3101     0.3223
                         (0.4573)   (0.3555)   (0.4093)   (0.4972)   (0.4399)   (0.4144)
              Δ_x        0.1698     -0.0094    -0.1063    -0.1174    -0.1657    -0.1497
                         (0.1710)   (0.0985)   (0.1055)   (0.1137)   (0.1078)   (0.1005)
R1=1000       δ          0.1626     0.1460     0.0851     0.1524     -0.0009    0.0465     19/173
                         (0.2353)   (0.2138)   (0.2474)   (0.2810)   (0.2593)   (0.2562)
              Δ_0        0.1253     0.1787     0.2083     0.2776     0.1597     0.2296
                         (0.2616)   (0.2075)   (0.2410)   (0.2625)   (0.2414)   (0.2469)
              Δ_x        0.0373     -0.0327    -0.1232    -0.1252    -0.1605    -0.1831*
                         (0.1308)   (0.1149)   (0.1120)   (0.1241)   (0.1073)   (0.1080)
R1=1250       δ          0.1308     0.1056     0.0286     0.0639     -0.0229    0.0463     27/165
                         (0.1595)   (0.1489)   (0.1789)   (0.1891)   (0.1787)   (0.1774)
              Δ_0        0.1360     0.1890     0.1740     0.2745     0.2032     0.2613
                         (0.1895)   (0.1644)   (0.1890)   (0.2016)   (0.1754)   (0.1850)
              Δ_x        -0.0051    -0.0833    -0.1455    -0.2107    -0.2262**  -0.2149*
                         (0.1091)   (0.1158)   (0.1079)   (0.1331)   (0.1118)   (0.1173)
R1=1500       δ          0.0560     0.0840     0.0115     0.0790     -0.0202    0.0637     44/148
                         (0.1194)   (0.1145)   (0.1239)   (0.1245)   (0.1430)   (0.1270)
              Δ_0        0.0535     0.1459     0.1349     0.2530     0.1201     0.1831
                         (0.1390)   (0.1278)   (0.1330)   (0.1586)   (0.1466)   (0.1568)
              Δ_x        0.0026     -0.0619    -0.1235    -0.1740    -0.1403    -0.1193
                         (0.0990)   (0.1230)   (0.1052)   (0.1341)   (0.1155)   (0.1303)
R1=1750       δ          0.0437     0.0320     -0.0172    0.0379     -0.0237    0.0574     52/140
                         (0.1081)   (0.1109)   (0.1106)   (0.1213)   (0.1337)   (0.1144)
              Δ_0        0.0515     0.0611     0.0967     0.2114     0.1084     0.1779
                         (0.1211)   (0.1157)   (0.1204)   (0.1469)   (0.1370)   (0.1436)
              Δ_x        -0.0078    -0.0291    -0.1139    -0.1735    -0.1321    -0.1206
                         (0.0906)   (0.1131)   (0.0958)   (0.1302)   (0.1204)   (0.1273)
R1=2000       δ          0.0208     -0.0132    -0.0178    -0.0446    -0.0722    0.0125     70/122
                         (0.1074)   (0.1079)   (0.1071)   (0.1175)   (0.1178)   (0.1107)
              Δ_0        0.0230     0.0657     0.0530     0.1115     0.0862     0.1253
                         (0.1206)   (0.1033)   (0.1103)   (0.1387)   (0.1334)   (0.1350)
              Δ_x        -0.0021    -0.0789    -0.0708    -0.1561    -0.1584    -0.1128
                         (0.0853)   (0.1138)   (0.0952)   (0.1292)   (0.1195)   (0.1343)


R2=3500. Standard errors clustered by locality in parentheses.
*** p<0.01, ** p<0.05, * p<0.1
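Because $\delta = \Delta_0 + \Delta_x$ holds by construction, the point estimates in Table 13 can be checked for internal consistency. The Python sketch below re-enters the reported point estimates (standard errors are omitted, since they do not add up across components) and computes the largest gap between $\delta$ and $\Delta_0 + \Delta_x$; with four-decimal rounding, gaps of about 0.0001 are expected. The names `TABLE13` and `max_rounding_gap` are illustrative only.

```python
# Point estimates from Table 13 for 2003-2008: (delta, Delta_0, Delta_x) per R1 in meters.
TABLE13 = {
    750: ([0.2008, 0.2252, 0.4293, 0.3343, 0.1444, 0.1726],
          [0.0310, 0.2346, 0.5356, 0.4517, 0.3101, 0.3223],
          [0.1698, -0.0094, -0.1063, -0.1174, -0.1657, -0.1497]),
    1000: ([0.1626, 0.1460, 0.0851, 0.1524, -0.0009, 0.0465],
           [0.1253, 0.1787, 0.2083, 0.2776, 0.1597, 0.2296],
           [0.0373, -0.0327, -0.1232, -0.1252, -0.1605, -0.1831]),
    1250: ([0.1308, 0.1056, 0.0286, 0.0639, -0.0229, 0.0463],
           [0.1360, 0.1890, 0.1740, 0.2745, 0.2032, 0.2613],
           [-0.0051, -0.0833, -0.1455, -0.2107, -0.2262, -0.2149]),
    1500: ([0.0560, 0.0840, 0.0115, 0.0790, -0.0202, 0.0637],
           [0.0535, 0.1459, 0.1349, 0.2530, 0.1201, 0.1831],
           [0.0026, -0.0619, -0.1235, -0.1740, -0.1403, -0.1193]),
    1750: ([0.0437, 0.0320, -0.0172, 0.0379, -0.0237, 0.0574],
           [0.0515, 0.0611, 0.0967, 0.2114, 0.1084, 0.1779],
           [-0.0078, -0.0291, -0.1139, -0.1735, -0.1321, -0.1206]),
    2000: ([0.0208, -0.0132, -0.0178, -0.0446, -0.0722, 0.0125],
           [0.0230, 0.0657, 0.0530, 0.1115, 0.0862, 0.1253],
           [-0.0021, -0.0789, -0.0708, -0.1561, -0.1584, -0.1128]),
}


def max_rounding_gap(table):
    """Largest |delta - (Delta_0 + Delta_x)| over all R1 rows and years."""
    return max(
        abs(d - (d0 + dx))
        for deltas, deltas_0, deltas_x in table.values()
        for d, d0, dx in zip(deltas, deltas_0, deltas_x)
    )


print(f"largest gap: {max_rounding_gap(TABLE13):.4f}")
```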
B Figures


Figure B.1: Euclidean vs. Road Distances

[Plot: Euclidean distance vs. road distance, with fitted linear relation]

Source: Own calculations based on the OSM road network.
Figure B.2: Distance and scores relationship
Figure B.3: Euclidean Distance Estimators with School-Specific Trends

[Panels: A. Discrete I; B. Discrete II; C. Continuous: Exponential. Axis: coefficient estimate. Series: R1=1500, R1=2500, R1=3500.]

Confidence intervals at 95% level.
Figure B.4: Road Distance Estimators

[Panels: A. Discrete I; B. Discrete II; C. Continuous: Exponential. Axis: coefficient estimate. Series: R1=1500, R1=2500, R1=3500.]

Confidence intervals at 95% level.
Figure B.5: Matching Test Scores Evolution: Discrete I

[Panels: R1=1000; R1=1500]

Figure B.6: Matching Test Scores Evolution: Discrete II

[Panels: R1=1000; R1=1500]
C Appendix: Oaxaca-Blinder and DiD


Here we propose a new identification strategy that combines the advantages of the Blinder-Oaxaca decomposition with the DiD specification. The Blinder (1973) and Oaxaca (1973) procedure decomposes the difference in a variable $y$ between two groups, $\delta = E[y \mid T = 1] - E[y \mid T = 0]$, into a component due to differences in observed characteristics $x$, $\Delta_x$, and a component unrelated to them, $\Delta_0$. We assume a linear relationship between the observed characteristics $x$ and the outcome $y$, which can be specific to group $T$:

$$y = \beta_0 + \beta_1 x + \beta_2 T + \beta_3 T \cdot x + \varepsilon$$

If we impose $E[\varepsilon \mid T = 1] = E[\varepsilon \mid T = 0]$, the difference $\delta$ can be expressed in terms of the difference in $x$ between the two groups plus a remainder:

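$$
\begin{aligned}
\delta &= E[y \mid T = 1] - E[y \mid T = 0] \\
&= \left[\beta_0 + \beta_2 + (\beta_1 + \beta_3) E[x \mid T = 1]\right] - \left[\beta_0 + \beta_1 E[x \mid T = 0]\right] \\
&= \beta_2 + (\beta_1 + \beta_3) E[x \mid T = 1] - \beta_1 E[x \mid T = 0] \\
&= \left\{\beta_2 + \beta_3 E[x \mid T = 1]\right\} + \left\{\beta_1 \left(E[x \mid T = 1] - E[x \mid T = 0]\right)\right\} \\
&= \{\Delta_0\} + \{\Delta_x\}
\end{aligned}
$$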

We define $\Delta_x = \beta_1(E[x \mid T = 1] - E[x \mid T = 0])$ as the difference in $y$ attributable to the difference in $x$ between groups $T = 1$ and $T = 0$. The other term, $\Delta_0 = \beta_2 + \beta_3 E[x \mid T = 1]$, reflects the difference in $y$ that is not explained by the difference in $x$. In empirical labour economics, this latter term has usually been interpreted as the ‘discrimination’ associated with being part of $T = 1$. In the treatment effects literature, where $T$ is a treatment whose effect is heterogeneous in $x$, the ‘unexplained’ component is the average treatment effect on the treated (Fortin et al., 2011).
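The two-group decomposition can be illustrated with a short Python sketch using NumPy. It assumes a single covariate $x$, estimates $\beta_0, \dots, \beta_3$ by OLS on the pooled, interacted equation, and replaces the conditional expectations with sample means. The function name `blinder_oaxaca` and the simulated data are hypothetical and only serve to show the mechanics; this is not the code behind our estimates.

```python
import numpy as np


def blinder_oaxaca(y, x, t):
    """Sketch of the two-group decomposition delta = Delta_0 + Delta_x.

    Fits y = b0 + b1*x + b2*T + b3*T*x + e by OLS (one covariate x) and
    evaluates the decomposition at the sample means of x in each group.
    """
    y, x, t = (np.asarray(v, dtype=float) for v in (y, x, t))
    X = np.column_stack([np.ones_like(x), x, t, t * x])
    _, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
    x1, x0 = x[t == 1].mean(), x[t == 0].mean()
    delta = y[t == 1].mean() - y[t == 0].mean()
    delta_x = b1 * (x1 - x0)   # part explained by the gap in x
    delta_0 = b2 + b3 * x1     # unexplained part
    return delta, delta_0, delta_x


# Simulated (hypothetical) data: group T = 1 has higher x and a steeper slope.
rng = np.random.default_rng(0)
n = 2_000
t = rng.integers(0, 2, size=n)
x = rng.normal(1.0 + 0.5 * t, 1.0)
y = 1.0 + 0.8 * x + 0.3 * t + 0.2 * t * x + rng.normal(0.0, 0.5, size=n)

delta, delta_0, delta_x = blinder_oaxaca(y, x, t)
print(f"delta={delta:.3f}  Delta_0={delta_0:.3f}  Delta_x={delta_x:.3f}")
```

Because the interacted model includes both an intercept and the group dummy $T$, the OLS residuals average to zero within each group, so $\delta = \Delta_0 + \Delta_x$ holds exactly in the sample and not only in expectation.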

We propose a Difference-in-Differences (DiD) analogue of the decomposition, which identifies the part of the variation that is explained by the impact on an observed channel $x$. In the case of our program, we would like to understand which part of the effect is due to an improvement in school results through increases in certain inputs, and which part is due to a general impact unrelated to them. To the best of our knowledge, this is the first paper that implements this decomposition.

Assume that we observe two periods, $A \in \{0, 1\}$. We then define the average treatment effect on the treated estimator:

$$\delta = \left(E[y \mid T = 1, A = 1] - E[y \mid T = 0, A = 1]\right) - \left(E[y \mid T = 1, A = 0] - E[y \mid T = 0, A = 0]\right)$$

This is the classical DiD estimator under the usual parallel trends assumption. It can be retrieved from the traditional specification:

$$y = \eta_0 + \eta_1 T + \eta_2 A + \delta \, T \cdot A + \varepsilon$$
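A minimal NumPy sketch, on hypothetical simulated data, of why the two routes coincide: in the saturated two-by-two regression, the OLS coefficient on $T \cdot A$ equals the difference-in-differences of the four cell means. The function names are illustrative, and inference (for example, clustering by locality as in our tables) is not addressed here.

```python
import numpy as np


def did_from_means(y, t, a):
    """delta = (E[y|T=1,A=1] - E[y|T=0,A=1]) - (E[y|T=1,A=0] - E[y|T=0,A=0])."""
    def cell_mean(tt, aa):
        return y[(t == tt) & (a == aa)].mean()
    return (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))


def did_from_ols(y, t, a):
    """OLS coefficient on T*A in y = eta_0 + eta_1 T + eta_2 A + delta T*A + e."""
    X = np.column_stack([np.ones_like(y), t, a, t * a])
    return np.linalg.lstsq(X, y, rcond=None)[0][3]


# Hypothetical data with a true DiD effect of 0.15.
rng = np.random.default_rng(1)
n = 4_000
t = rng.integers(0, 2, size=n)
a = rng.integers(0, 2, size=n)
y = 0.5 + 0.2 * t + 0.1 * a + 0.15 * t * a + rng.normal(0.0, 0.3, size=n)

print(did_from_means(y, t, a), did_from_ols(y, t, a))
```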

Now assume that part of this impact is due to variation in a particular variable $x$ that is affected by the treatment. Our decomposition splits the treatment effect of $T$ on $y$ into the impact through the observed channel, $\Delta_x$, and the impact through other channels, $\Delta_0$. It can be implemented using the following linear equation:


$$y = \alpha_0 + \alpha_1 T + \alpha_2 A + \alpha_3 x + \alpha_4 x \cdot T + \alpha_5 x \cdot A + \alpha_6 T \cdot A + \alpha_7 x \cdot T \cdot A + u$$


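Given that the conditional expectations implied by this equation are

$$
\begin{aligned}
E[y \mid T = 0, A = 0] &= \alpha_0 + \alpha_3 E[x \mid T = 0, A = 0] \\
E[y \mid T = 1, A = 0] &= \alpha_0 + \alpha_1 + (\alpha_3 + \alpha_4) E[x \mid T = 1, A = 0] \\
E[y \mid T = 0, A = 1] &= \alpha_0 + \alpha_2 + (\alpha_3 + \alpha_5) E[x \mid T = 0, A = 1] \\
E[y \mid T = 1, A = 1] &= \alpha_0 + \alpha_1 + \alpha_2 + \alpha_6 + (\alpha_3 + \alpha_4 + \alpha_5 + \alpha_7) E[x \mid T = 1, A = 1]
\end{aligned}
$$

the impact $\delta$ can be decomposed into the variation in $x$ that is correlated with the treatment implementation, $\Delta_x$, and the variation explained by other channels, $\Delta_0$: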


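$$
\begin{aligned}
\delta &= \Big(\big(\alpha_0 + \alpha_1 + \alpha_2 + \alpha_6 + (\alpha_3 + \alpha_4 + \alpha_5 + \alpha_7) E[x \mid T = 1, A = 1]\big) - \big(\alpha_0 + \alpha_2 + (\alpha_3 + \alpha_5) E[x \mid T = 0, A = 1]\big)\Big) \\
&\quad - \Big(\big(\alpha_0 + \alpha_1 + (\alpha_3 + \alpha_4) E[x \mid T = 1, A = 0]\big) - \big(\alpha_0 + \alpha_3 E[x \mid T = 0, A = 0]\big)\Big) \\
&= \alpha_6 + (\alpha_4 + \alpha_5 + \alpha_7) E[x \mid T = 1, A = 1] - \alpha_5 E[x \mid T = 0, A = 1] - \alpha_4 E[x \mid T = 1, A = 0] \\
&\quad + \alpha_3 \big[(E[x \mid T = 1, A = 1] - E[x \mid T = 0, A = 1]) - (E[x \mid T = 1, A = 0] - E[x \mid T = 0, A = 0])\big] \\
&= \Delta_0 + \Delta_x
\end{aligned}
$$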

Hence, the impact on $y$ due to $T$ that can be explained by the impact of $T$ on $x$ is

$$\Delta_x = \alpha_3 \big[(E[x \mid T = 1, A = 1] - E[x \mid T = 0, A = 1]) - (E[x \mid T = 1, A = 0] - E[x \mid T = 0, A = 0])\big]$$

and the remaining variation is

$$\Delta_0 = \alpha_6 + (\alpha_4 + \alpha_5 + \alpha_7) E[x \mid T = 1, A = 1] - \alpha_5 E[x \mid T = 0, A = 1] - \alpha_4 E[x \mid T = 1, A = 0]$$
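The DiD decomposition can be sketched the same way. The Python code below is a hypothetical illustration, not the code used for Table 13: it fits the fully interacted equation by OLS with one covariate $x$, plugs the sample cell means of $x$ into the expressions for $\Delta_x$ and $\Delta_0$, and returns them together with the DiD estimate $\delta$. It computes no standard errors. In the simulated data the treatment raises $x$ in the second period, so a large share of $\delta$ should load on $\Delta_x$.

```python
import numpy as np


def bo_dd(y, x, t, a):
    """Sketch of the DiD analogue: returns (delta, Delta_0, Delta_x).

    Fits y = a0 + a1 T + a2 A + a3 x + a4 xT + a5 xA + a6 TA + a7 xTA + u by OLS
    and plugs sample cell means of x into the expressions for Delta_0 and Delta_x.
    """
    y, x, t, a = (np.asarray(v, dtype=float) for v in (y, x, t, a))
    X = np.column_stack([np.ones_like(y), t, a, x, x * t, x * a, t * a, x * t * a])
    al = np.linalg.lstsq(X, y, rcond=None)[0]   # al[k] estimates alpha_k

    def ex(tt, aa):  # sample analogue of E[x | T = tt, A = aa]
        return x[(t == tt) & (a == aa)].mean()

    def ey(tt, aa):  # sample analogue of E[y | T = tt, A = aa]
        return y[(t == tt) & (a == aa)].mean()

    delta = (ey(1, 1) - ey(0, 1)) - (ey(1, 0) - ey(0, 0))
    delta_x = al[3] * ((ex(1, 1) - ex(0, 1)) - (ex(1, 0) - ex(0, 0)))
    delta_0 = (al[6] + (al[4] + al[5] + al[7]) * ex(1, 1)
               - al[5] * ex(0, 1) - al[4] * ex(1, 0))
    return delta, delta_0, delta_x


# Hypothetical data: treatment raises x when A = 1, and x raises y.
rng = np.random.default_rng(2)
n = 20_000
t = rng.integers(0, 2, size=n)
a = rng.integers(0, 2, size=n)
x = 1.0 + 0.3 * t + 0.2 * a + 0.5 * t * a + rng.normal(0.0, 1.0, size=n)
y = 0.5 + 0.2 * t + 0.1 * a + 0.1 * t * a + 0.8 * x + rng.normal(0.0, 0.5, size=n)

delta, delta_0, delta_x = bo_dd(y, x, t, a)
print(f"delta={delta:.3f}  Delta_0={delta_0:.3f}  Delta_x={delta_x:.3f}")
```

Since the fully interacted equation is equivalent to a separate regression of $y$ on $x$ within each $(T, A)$ cell, the sample identity $\delta = \Delta_0 + \Delta_x$ holds exactly. Note also that $\Delta_x$ weights the DiD in $x$ by $\alpha_3$, the slope in the untreated, pre-period cell, in the same way that the two-group version weights the gap in $x$ by $\beta_1$.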